Discourse-Level Annotation For Investigating Information Structure

Authors

  • Ivana Kruijff-Korbayova
  • Geert-Jan M. Kruijff
Abstract

We present discourse-level annotation of newspaper texts in German and English, as part of an ongoing project aimed at investigating information structure from a cross-linguistic perspective. Rather than annotating some specific notion of information structure, we propose a theory-neutral annotation of basic features at the levels of syntax, prosody and discourse, using treebank data as a starting point. Our discourse-level annotation scheme covers properties of discourse referents (e.g., semantic sort, delimitation, quantification, familiarity status) and anaphoric links (coreference and bridging). We illustrate what investigations this data serves and discuss some integration issues involved in combining different levels of stand-off annotations, created by using different tools.
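
The stand-off idea in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' actual format: the class names, field names and attribute values are hypothetical, since the abstract names the feature categories (semantic sort, delimitation, quantification, familiarity status) and link types (coreference, bridging) but not their value sets or file layout. The point is only that referents and links point into the token sequence by index rather than being embedded in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Referent:
    """A discourse referent stored stand-off: it refers to tokens by index."""
    ref_id: str
    start: int            # index of the first token (inclusive)
    end: int              # index of the last token (inclusive)
    semantic_sort: str    # hypothetical value set
    delimitation: str     # hypothetical value set
    quantification: str   # hypothetical value set
    familiarity: str      # hypothetical value set


@dataclass(frozen=True)
class AnaphoricLink:
    """A directed link from an anaphor to its antecedent."""
    anaphor: str          # ref_id of the anaphoric referent
    antecedent: str       # ref_id of the antecedent referent
    kind: str             # "coreference" or "bridging"


def surface(tokens: List[str], referent: Referent) -> str:
    """Recover the text span of a referent from the base token layer."""
    return " ".join(tokens[referent.start:referent.end + 1])


def antecedents_of(ref_id: str, links: List[AnaphoricLink],
                   kind: Optional[str] = None) -> List[str]:
    """Return antecedent ids for a referent, optionally filtered by link kind."""
    return [l.antecedent for l in links
            if l.anaphor == ref_id and (kind is None or l.kind == kind)]


# Base layer: the tokenised text (in the paper, this would come from a treebank).
tokens = "The committee met yesterday . Its chairman resigned .".split()

# Discourse layer: referents and links kept separately from the text.
referents = {
    "r1": Referent("r1", 0, 1, "organization", "unique", "singular", "new"),
    "r2": Referent("r2", 5, 5, "organization", "unique", "singular", "old"),
    "r3": Referent("r3", 5, 6, "person", "unique", "singular", "mediated"),
}
links = [
    AnaphoricLink("r2", "r1", "coreference"),  # "Its" -> "The committee"
    AnaphoricLink("r3", "r1", "bridging"),     # "Its chairman" -> "The committee"
]

for rid in ("r2", "r3"):
    for ante in antecedents_of(rid, links):
        print(surface(tokens, referents[rid]), "->", surface(tokens, referents[ante]))
```

Because each layer only stores indices into the token sequence, a syntactic or prosodic layer produced by a different tool could in principle be aligned with this one through the same token indices, which is the kind of integration issue the abstract mentions.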

Similar resources

QUD-Based Annotation of Discourse Structure and Information Structure: Tool and Evaluation

We discuss and evaluate a new annotation scheme and discourse-analytic method, the QUD-tree framework. We present an annotation study, in which the framework, based on the concept of Questions under Discussion, is applied to English and German interview data, using TreeAnno, an annotation tool specially developed for this new kind of discourse annotation. The results of an inter-annotator agree...

Using A Probabilistic Model Of Discourse Relations To Investigate Word Order Variation

Like speakers of any natural language, speakers of English potentially have many different word orders in which to encode a single meaning. One key factor in speakers’ use of certain non-canonical word orders in English is their ability to contribute information about syntactic and semantic discourse relations. Explicit annotation of discourse relations is a difficult and subjective task. In or...

Exploiting Semantic Information For Manual Anaphoric Annotation In Cast3LB Corpus

This paper presents the discourse annotation followed in Cast3LB, a Spanish corpus annotated with several information sources (morphological, syntactic, semantic and coreferential) at the syntactic, semantic and discourse levels. The 3LB annotation scheme has been developed for three languages (Spanish, Catalan and Basque). Human annotators have used a set of tagging techniques and protocols. Several to...

Towards interoperable discourse annotation. Discourse features in the Ontologies of Linguistic Annotation

This paper describes the extension of the Ontologies of Linguistic Annotation (OLiA) with respect to discourse features. The OLiA ontologies provide a terminology repository that can be employed to facilitate the conceptual (semantic) interoperability of annotations of discourse phenomena as found in the most important corpora available to the community, including OntoNotes, the RST Discourse...

A Framework For Annotating Information Structure In Discourse

We present a framework for the integrated analysis of the textual and prosodic characteristics of information structure in the Switchboard corpus of conversational English. Information structure describes the availability, organisation and salience of entities in a discourse model. We present standards for the annotation of information status (old, mediated and new), and give guidelines for ann...

Publication date: 2004